Localization on a-priori information of plane extraction

Localization constitutes a critical challenge for autonomous mobile robots, with flat walls serving as a fundamental reference for indoor localization. In numerous scenarios, prior knowledge of a wall's surface plane is available, such as planes in building information modeling (BIM) systems. This article presents a localization technique based on a-priori plane point cloud extraction. The position and pose of the mobile robot are estimated through real-time multi-plane constraints. An extended image coordinate system is proposed to represent any plane in space and to establish correspondences between visible planes and those in the world coordinate system. Potentially visible points representing the constrained plane in the real-time point cloud are filtered using a filter region of interest (ROI), derived from the theoretically visible plane region within the extended image coordinate system. The number of points representing each plane determines its weight in the multi-plane localization calculation. Experimental validation demonstrates that the proposed localization method tolerates errors in the initial position and pose.


Introduction
Localization is of paramount importance for mobile robotics, particularly for indoor robots operating in environments where external navigational aids such as GPS are unreliable or unavailable. For example, painting robots localize themselves to apply paint on target surfaces within multiple rooms [1,2], while inspecting power distribution cabinets requires localization to identify the intended targets [3]. Furthermore, localization is an integral component of simultaneous localization and mapping (SLAM), a widely employed technique for robotic navigation in unmapped environments [4][5][6].
The challenges associated with mobile robot localization are heightened in settings where distinctive features are either sparse or repetitive, such as indoor construction sites [7,8]. In these scenarios, feature-based localization methods face significant challenges due to the lack of unique and easily identifiable features.
In contrast, indoor construction sites often possess structured information in the form of blueprints such as a Building Information Model (BIM), which detail the locations of walls and other architectural elements [9]. This a-priori knowledge presents an opportunity to enhance conventional feature-based localization methods by integrating the structured parameters into the localization procedure.
The development of innovative localization techniques that can effectively utilize structured information from blueprints has emerged as a challenge for robotics developers. These methods aim to improve the localization performance of mobile robots in indoor construction sites and other environments that lack distinct features yet possess adequate plane characteristics for localization purposes.

Review of related works
Solving the localization problem is a prerequisite for mobile robots to perform tasks effectively. Robots can utilize various methods for localization, such as magnetic [10] or sonar-based techniques [11]. Integrating historical signals and fusing data from multiple sensors are also crucial to enhance localization accuracy and robustness [12]. As research progresses, innovative approaches will continue to emerge, addressing diverse localization scenarios and advancing mobile robot capabilities.
The utilization of visual information for mobile robot localization is a crucial aspect of navigation, and it has been the subject of extensive research. Two primary categories of methods have emerged: direct methods [13,14] and indirect methods [15]. Direct methods analyze raw sensor data, such as depth [16,17] or pixel information [18], to estimate the robot's position and orientation within its environment. Some direct methods leverage the initial position and pose estimates to enhance localization performance [19]. Conversely, indirect methods extract distinct features from the sensor data and use these features to estimate the robot's pose [20,21]. Feature-based methods can be more efficient than direct methods, as they operate on a reduced set of data points [22][23][24]. However, the efficacy of indirect methods depends on the quality and uniqueness of the extracted features, which may be difficult to acquire in environments characterized by featureless planes or repetitive structures.
A widely employed primitive in indoor environments is the plane primitive, which is frequently utilized in the localization process. Furthermore, planes typically serve as the basis for measurements [25,26]. A pi-SLAM [27] introduced a real-time dense planar LiDAR-based localization system that employs planes as landmarks. Zhou et al. [28] outperformed existing localization methods by using planes, lines, and cylinders. Wen et al. [29] utilized the absolute ground plane to constrain vertical pose estimation, thereby reducing the estimation error. A point-plane-object localization system was proposed for semantic map reconstruction [30], which demonstrated effective localization in indoor scenarios. However, the method in [30] imposes strict criteria, such as planes being parallel or perpendicular to one another. Additionally, in both [27,28], planes are identified solely by analyzing point clouds. Moreover, the extracted planes, represented by point clouds, demand an extra registration process to incorporate the planes into the existing map. In low-texture environments, robots can leverage intersection lines between walls and floors to model walls and scenes [31]. However, this method requires that the intersection lines be visible. Region-of-interest (ROI) approaches are well-suited for extracting relevant point clouds [32,33]. When planar surfaces within a scene lack notable distinctions or exhibit visual similarity, a-priori ROI information can be employed to segment and differentiate these planes [34]. The segmented planes facilitate matching the observed planes with structured information from blueprints or other sources, thereby improving localization performance.
Building Information Modeling (BIM) technology enhances the informativeness, intelligence, and eco-friendliness of the construction industry [35]. Numerous studies have explored the integration of a-priori BIM blueprint data with mobile robot localization [1,9,36]. The a-priori plane information of a construction site is readily accessible from the digital model [37]. The a-priori known planes from BIM models have wide applicability. These planes are extracted to facilitate the detection, identification, and localization of lighting elements [38]. Autonomous monitoring of construction progress is achieved by comparing BIM models to the acquired photogrammetric point clouds [39]. Zhao et al. [9] proposed a feature-based method to localize mobile robots by matching features between online information and the geometric and semantic information retrieved from BIM. However, this method is constrained by feature quality and is ill-suited for featureless scenarios. Schaub et al. [36] localized mobile robots by matching the entire point cloud in the scene to the complete point cloud generated by BIM. Nonetheless, matching entire point clouds requires high-quality online point cloud observations. These localization studies suggest that only a portion of the online point cloud in a scene is sufficient for a mobile robot to determine its location. By combining this with the plane primitives commonly present in indoor environments, mobile robots can leverage multiple a-priori planes for localization.

Objectives and scope of this study
In light of the deficiencies inherent in existing localization methods, this paper proposes a novel localization approach that utilizes a-priori known planes in the absolute environment. The main idea of this method is to use the mobile robot's coarse initial pose and the absolute position of planes in the map to extract planes from the point cloud obtained online via the region of interest (ROI) method. Subsequently, by comparing the plane poses represented in the camera coordinate system and the ground coordinate system, the refined pose and position of the robot can be determined. This method does not require the intersection lines and vertices of the planes to be visible.
The main contributions of this paper include:
• An approach that employs a-priori absolute reference planes and the mobile robot's coarse initial pose to extract plane point clouds from the point cloud obtained online.
• An extended image coordinate system that aids in locating invisible primitives and determining the visibility of any reference plane.
• A multi-plane localization algorithm that calculates the robot's pose and position using the a priori planes' absolute poses and the observed planes within view.
The remainder of the paper is organized as follows. Section 2 presents the problem formulation of localization with multiple planes. Section 3 outlines the method details of localization, with Section 3.1 introducing the method of representing arbitrary planes in the extended image system to help determine whether reference planes are visible. Section 3.2 proposes the algorithm for localization with multiple planes, and Section 3.3 provides an overview of the proposed methods for localization. Section 4 discusses experimental tests on algorithm feasibility. Lastly, Section 5 and Section 6 present the discussion and conclusion, respectively.

The mobile robot moves within the $x_G O y_G$ plane of the ground coordinate system $O_G x_G y_G z_G$. If the coordinates $P_r(x, y, z)$ denote the robot's position, the z-axis component is zero.
When considering movement between floors or uneven terrain, the ground coordinate system, $O_G x_G y_G z_G$, does not coincide with the world coordinate system, $O_W x_W y_W z_W$. The homogeneous transformation matrix from the ground coordinate system to the world coordinate system is defined as $T^W_G$. The ground coordinate system serves as the primary reference for the robot's motion. In this paper, the ground coordinate system is assumed to be unique. If multiple ground coordinate systems exist due to uneven terrain, the proposed method remains applicable by substituting the corresponding transformation matrix $(T^W_G)_i$. A robot coordinate system, $O_R x_R y_R z_R$, and the corresponding homogeneous transformation matrix, $T^G_R$, are defined to express the positions of surrounding objects relative to the mobile robot. The robot is situated at the origin of the robot coordinate system. The $z_R$ axis of $O_R x_R y_R z_R$ ... The relative spatial relationships of the aforementioned coordinate systems are illustrated in Fig 1(A).
As a mobile robot moves on the ground, its pose and position can be represented by three parameters, $x$, $y$ and $\varphi$. The parameters $x$ and $y$ denote the mobile robot's coordinates on the ground plane $x_G O y_G$. The parameter $\varphi$ represents the mobile robot's pose angle, defined as the included angle between the robot's forward direction and the x-axis direction vector of the ground coordinate system. The homogeneous transformation matrix, $T^G_R$, can be expressed by the pose parameters $(x, y, \varphi)$, as shown in Eq (1).
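Eq (1) is not reproduced in this text. As a minimal sketch, assuming the usual planar-motion form (a rotation by $\varphi$ about the z-axis followed by a translation $(x, y, 0)$), the matrix $T^G_R$ and its use to map a robot-frame point into the ground frame could look as follows. The function name `pose_to_transform` and the numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pose_to_transform(x, y, phi):
    """Assumed form of T^G_R: rotation by phi about z, then translation (x, y, 0)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([
        [c,  -s,  0.0, x],
        [s,   c,  0.0, y],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])

# Hypothetical pose: robot at (500, 500) mm, facing the negative x_G direction.
T_G_R = pose_to_transform(500.0, 500.0, np.pi)

# A point 100 mm ahead of the robot, in homogeneous robot coordinates.
p_R = np.array([100.0, 0.0, 0.0, 1.0])

# Map the point into the ground coordinate system.
p_G = T_G_R @ p_R
```

With the robot facing the negative x direction, a point ahead of it lands at a smaller x coordinate in the ground frame, which is a quick sanity check on the sign convention of $\varphi$.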
PLOS ONE

Suppose the homogeneous coordinates of a point in the robot coordinate system are $p_R$, and the corresponding homogeneous coordinates in the ground coordinate system are $p_G$. The transformation equation is given in Eq (2), where the transformation matrix $T^G_R$ is defined by Eq (1).
Cameras are fixed on the mobile robot. The camera coordinate system is a 3-D coordinate system established with the camera placed at the origin. The conventional image coordinate system is a 2-D coordinate system that represents positions on the image. Fig 1(B) shows the arrangement of the camera coordinate system and the image coordinate system.
Consider a point with homogeneous coordinates $P(x_P, y_P, z_P, 1)$ in the camera coordinate system and corresponding homogeneous coordinates $P(u_P, v_P, 1)$ in the conventional image coordinate system. The transformation from the camera coordinates to the conventional image coordinates is given by Eq (3), where $f_x$, $f_y$, $u_0$, $v_0$ represent the intrinsic parameters of the camera.
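Eq (3) is not reproduced in this text. The sketch below assumes the standard pinhole model with the camera z-axis as the optical axis, which is consistent with the intrinsic parameters $f_x$, $f_y$, $u_0$, $v_0$ named above. The function name and the numerical intrinsics are hypothetical.

```python
def project_to_image(point_c, fx, fy, u0, v0):
    """Project a camera-frame point (x, y, z) to pixel coordinates (u, v).

    Assumes a pinhole model without lens distortion.
    """
    x, y, z = point_c
    if z <= 0:
        # Points on or behind the image plane have no conventional image coordinates.
        raise ValueError("point is not in front of the camera")
    return (fx * x / z + u0, fy * y / z + v0)

# Hypothetical intrinsics, for illustration only.
u, v = project_to_image((0.5, -0.25, 2.0), fx=600.0, fy=600.0, u0=320.0, v0=240.0)
```

The explicit failure for $z \le 0$ marks the limitation that motivates an extended image coordinate system: the conventional projection cannot represent points outside the field of view.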
Assume that the homogeneous representation of the plane, $\pi$, is defined as Eq (4), where the elements $\pi_1$, $\pi_2$, $\pi_3$, $\pi_4$ are the coefficients of the plane $\pi$. The general equation of a plane can be expressed using the homogeneous representation, as shown in Eq (5).
Consider a plane $\pi_i$ with its representation in the ground coordinate system as $\pi^G_i$. The plane is directly observed by the camera, and its representation in the camera coordinate system is $\pi^C_i$. The transformation between the two plane representations is shown in Eq (6), where $T^G_R$ represents the transformation matrix from the robot coordinate system to the ground coordinate system, and $T^R_C$ represents the transformation matrix from the camera coordinate system to the robot coordinate system. The derivation of Eq (6) can be found in S1 Appendix.
The structure of the matrix $T^G_R$ is determined by the pose vector, $(x, y, \varphi)$, according to Eq (1).
The structure of the matrix $T^R_C$ is determined by the mechanical connection geometry between the camera and the robot. Once the camera is mounted on the mobile robot, the matrix $T^R_C$ is considered invariant. Since Eq (6) is established by the vision system's observation, and the pose vector serves as the unknown variable, it describes a position and pose constraint for the mobile robot.
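Eq (6) and its derivation (S1 Appendix) are not reproduced in this text. A minimal sketch of one standard way to relate the two plane representations follows: if a point transforms as $p_G = T^G_R T^R_C\, p_C$ and lies on the plane when $(\pi^G)^T p_G = 0$, then $\pi^C = (T^G_R T^R_C)^T \pi^G$. This form is an assumption about Eq (6), and the function name and example matrices are hypothetical.

```python
import numpy as np

def plane_ground_to_camera(pi_G, T_G_R, T_R_C):
    """Express a ground-frame plane in the camera frame.

    Points map as p_G = T_G_R @ T_R_C @ p_C, so plane coefficients map
    with the transpose of the same matrix.
    """
    T_G_C = T_G_R @ T_R_C
    return T_G_C.T @ pi_G

# Hypothetical robot pose (500, 500, pi/2), written out as a matrix.
T_G_R = np.array([
    [0.0, -1.0, 0.0, 500.0],
    [1.0,  0.0, 0.0, 500.0],
    [0.0,  0.0, 1.0,   0.0],
    [0.0,  0.0, 0.0,   1.0],
])

# Hypothetical camera mount: pure translation, no rotation.
T_R_C = np.eye(4)
T_R_C[:3, 3] = [50.0, 0.0, 200.0]

# Hypothetical wall x = 1000 mm in the ground frame.
pi_G = np.array([1.0, 0.0, 0.0, -1000.0])
pi_C = plane_ground_to_camera(pi_G, T_G_R, T_R_C)
```

In this arrangement the unknown pose enters only through $T^G_R$, so each observed plane yields one constraint on $(x, y, \varphi)$.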
Multiple constraints in the form of Eq (6) constitute a constraint equation system. The position and pose of the mobile robot are estimated by solving this constraint equation system.
When multiple planes are observed simultaneously by the camera system, the constraint equation system can be expressed as Eq (7).
In Eq (7), $S$ represents the constraint equation system, and $(x, y, \varphi)$ represents the position and pose estimated from the constraint equation system. The superscript $i$ of $\pi^i_j$ is the index of the camera, where $1 \le i \le n$, and $n$ denotes the number of cameras in the vision system. The subscript $j$ of $\pi^i_j$ is the index of the plane within the field of view of camera $i$, where $1 \le j \le m_i$, and $m_i$ denotes the number of planes in the field of view of camera $i$. $(\pi^i_j)^G$ represents the plane $\pi^i_j$ expressed in the ground coordinate system. $(\pi^i_j)^C$ represents the plane $\pi^i_j$ expressed in the camera coordinate system. $T^G_R(x, y, \varphi)$ represents the homogeneous transformation matrix from the robot coordinate system to the ground coordinate system. $T^R_{C_i}$ denotes the homogeneous transformation matrix from camera $i$ to the robot coordinate system.
According to the constraint equation system shown in Eq (7), a plane, $\pi^i_j$, might be observed by more than one camera at the same time. In other words, planes $\pi^i_j$ with different indices $i$, $j$ may represent the same plane in the real-world environment. However, the same plane observed by different cameras provides different constraints. These additional constraints help to generate more precise estimates of position and pose.
Two main challenges arise when solving the constraint equation system. The first is how to establish the accurate connections between real-time observed planes and a-priori known plane parameters. The second is how to find the optimal solution for the constraint equation system. The remainder of this article will discuss these two problems separately.

Overview of proposed method
In a scene with multiple planes that lack distinctive features, the primary challenge when localizing a mobile robot lies in identifying and distinguishing these planes within the current field of view. To address this issue, the proposed method for localizing mobile robots based on a-priori map information combines the robot's current coarse pose estimate with the map's a-priori information. This enables the positions of the planes used for localization to be estimated within the field of view, allowing the observed planes to be matched with their corresponding a-priori map planes. Each camera's online identification of a plane, together with its mapping to a plane in the map, constitutes a pose constraint for the mobile robot.
When the number of planes in the field of view is redundant, an optimal estimation of the robot's pose is achieved through a weighted least squares approach. This approach takes into account the uncertainty of plane identification as the weight and ensures accurate and robust pose estimation.
The flowchart of the proposed method is shown in Fig 2. The a-priori planes, expressed by their vertices, represent the information of the a-priori known map, all of which is represented in the ground coordinate system. The a-priori planes can be obtained from the building blueprint in BIM or from a previously constructed map. The coarse transformation matrix between the camera coordinate system and the ground coordinate system is known from the coarse initial pose. An extended image coordinate system is set up to express points that are invisible in the camera coordinate system. The ROIs corresponding to each theoretically visible plane are determined using the coarse initial pose and the a-priori planes.
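The ROI step can be illustrated with a deliberately simplified sketch. It assumes that all vertices of the a-priori plane lie in front of the camera and uses the conventional pinhole projection with an axis-aligned bounding box as the ROI, rather than the extended image coordinate system and visible-region ROI of the proposed method. All function names, the `margin` parameter, and the example values are hypothetical.

```python
import numpy as np

def project(points_c, fx, fy, u0, v0):
    """Pinhole projection of Nx3 camera-frame points (all with z > 0)."""
    pts = np.asarray(points_c, dtype=float)
    u = fx * pts[:, 0] / pts[:, 2] + u0
    v = fy * pts[:, 1] / pts[:, 2] + v0
    return np.column_stack([u, v])

def roi_from_plane(vertices_g, T_G_C, intrinsics, margin=0.0):
    """Bounding-box ROI (u_min, v_min, u_max, v_max) of an a-priori plane.

    vertices_g: plane vertices in the ground frame.
    T_G_C: coarse camera-to-ground transform derived from the coarse pose.
    """
    verts = np.asarray(vertices_g, dtype=float)
    verts_h = np.column_stack([verts, np.ones(len(verts))])
    verts_c = (np.linalg.inv(T_G_C) @ verts_h.T).T[:, :3]
    if np.any(verts_c[:, 2] <= 0):
        raise ValueError("this sketch only handles vertices in front of the camera")
    uv = project(verts_c, *intrinsics)
    u_min, v_min = uv.min(axis=0) - margin
    u_max, v_max = uv.max(axis=0) + margin
    return u_min, v_min, u_max, v_max

def filter_by_roi(cloud_c, roi, intrinsics):
    """Keep the camera-frame points whose projections fall inside the ROI."""
    cloud = np.asarray(cloud_c, dtype=float)
    cloud = cloud[cloud[:, 2] > 0]
    uv = project(cloud, *intrinsics)
    u_min, v_min, u_max, v_max = roi
    keep = ((uv[:, 0] >= u_min) & (uv[:, 0] <= u_max) &
            (uv[:, 1] >= v_min) & (uv[:, 1] <= v_max))
    return cloud[keep]
```

Enlarging `margin` is one way to trade robustness against initial pose error for a larger share of outlier points in the filtered cloud.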
Using the obtained ROIs and the instantaneous point cloud captured online by the cameras, the entire point cloud is segmented into the individual planes suitable for localization. After segmentation using the ROIs, the parameters of each plane are identified by the random sample consensus (RANSAC) method. These identified parameters form the constraint plane equation system, in which the unknown variables represent the position and pose. The constraint equation system is solved to estimate the mobile robot's position and pose. If the system contains more than two equations, a weighted least squares method is used to obtain the optimal estimate.
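The plane identification step can be sketched with a basic RANSAC loop. This is a generic textbook variant, not necessarily the implementation used on the prototype. The threshold, iteration count, and function name are illustrative assumptions.

```python
import numpy as np

def fit_plane_ransac(points, threshold=5.0, iterations=200, seed=0):
    """Fit a plane (n_x, n_y, n_z, d) with unit normal to an Nx3 point cloud.

    Returns the plane and a boolean inlier mask. `threshold` is the
    maximum point-to-plane distance for an inlier, in the cloud's units.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best_plane = None
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(iterations):
        # Three random points define a candidate plane.
        sample = pts[rng.choice(len(pts), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # nearly collinear sample, skip it
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(pts @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_plane = np.append(normal, d)
            best_inliers = inliers
    return best_plane, best_inliers
```

The number of inliers returned here corresponds to the point count that later serves as the weight of the plane in the multi-plane estimation.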

Identifying localization reference planes using the extended image coordinate system
In conventional image plane coordinate systems, object representation is confined to the field of view. To enable the depiction of arbitrary spatial objects and conveniently ascertain a plane's visibility within the field of view, an extended image coordinate system has been devised.
Consider a spatial point, $P(x, y, z)$, within the camera coordinate system. Its coordinates in the extended image coordinate system are given by Eq (8) for $x \neq 0$, $y \neq 0$, where $\theta_h$ and $\theta_v$ denote the camera's horizontal and vertical fields of view, respectively, and $(u, v)$ denotes the coordinates within the extended image coordinate system. If $x = 0$, $y \neq 0$, Eq (8) is replaced by Eq (9).
The value domain of all points within the camera coordinate system after transformation is expressed in Eq (13).
Upon mapping each point from the camera coordinate system to the extended image coordinate system, the complete extended coordinate system comprises three regions. These three regions in the extended image coordinate system correspond to those in the camera coordinate system. Fig 4(A) and 4(B) depict the distribution of these regions viewed from the planes $y = 0$ and $x = 0$, respectively.
A point within the VFR of the extended image coordinate system is visible. However, a plane can be visible even when its vertices are not. The most basic unit of the plane primitive is the triangle, exemplified by vertices $P_1(x_1, y_1, z_1)$, $P_2(x_2, y_2, z_2)$, and $P_3(x_3, y_3, z_3)$, which represent an a-priori known plane for localization, as displayed in Fig 5. The coordinates of the reference plane vertices are calculated from the robot's coarse initial pose and position. Each triangle vertex must lie within one of the regions of the extended image coordinate system. Fig 5 demonstrates that as long as a portion of the a-priori plane is visible, the corresponding point cloud within the field of view can be extracted for localization. By assessing whether the vertices of the closed shape lie within the view region, the visibility issue can be partially resolved. Ten potential region distribution scenarios exist. The purpose of classifying the distribution of the triangle vertices across the regions of the extended image coordinate system is to determine quickly whether the bounded plane lies partially within the camera field of view. The intersection status check table is presented in Table 1.
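The count of ten scenarios follows from distributing three unordered vertices over three regions, which gives $\binom{5}{3} = 10$ combinations with repetition. The short enumeration below checks this. Only the name VFR appears in the text, so the other two region labels are placeholders.

```python
from itertools import combinations_with_replacement

# "VFR" is named in the text; the other two labels are placeholders.
REGIONS = ("VFR", "region_2", "region_3")

# Each scenario is an unordered assignment of the three triangle vertices to regions.
scenarios = list(combinations_with_replacement(REGIONS, 3))

for scenario in scenarios:
    print(scenario)
print(len(scenarios))  # prints 10
```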
Seven of the ten possible distribution scenarios can be resolved directly, while the remaining three require further calculations to verify visibility. The planes composing the quadrangular view pyramid, as shown in Fig 5, are given by Eq (14).

The prerequisite for the visibility of the plane primitive $P_1 P_2 P_3$ is that one of the segments $P_i P_j$ intersects a boundary plane, and that the intersection point lies within the VFR of the extended image coordinate system. The status check table helps reduce computational expense: if the triangle employed for robot localization is absent from the field of view, further intersection calculations are unnecessary.
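The edge-based check can be sketched as follows. Because Eq (14) is not reproduced in this text, the sketch assumes a symmetric view pyramid with its apex at the camera origin and the z-axis as the viewing direction, bounded by the planes $x = \pm z\tan(\theta_h/2)$ and $y = \pm z\tan(\theta_v/2)$. It works directly in camera coordinates instead of the extended image coordinate system, and it only covers the vertex test and the segment test described above. All names are hypothetical.

```python
import math

def in_view(p, tan_h, tan_v, eps=1e-9):
    """True if the camera-frame point p lies inside the assumed view pyramid."""
    x, y, z = p
    return z > 0 and abs(x) <= z * tan_h + eps and abs(y) <= z * tan_v + eps

def triangle_partly_visible(tri, fov_h, fov_v):
    """Vertex test first, then segment/boundary-plane intersection test."""
    tan_h, tan_v = math.tan(fov_h / 2), math.tan(fov_v / 2)
    if any(in_view(p, tan_h, tan_v) for p in tri):
        return True
    # Normals of the four boundary planes through the camera origin.
    boundaries = [(1, 0, -tan_h), (1, 0, tan_h), (0, 1, -tan_v), (0, 1, tan_v)]
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        for n in boundaries:
            sa = sum(n[k] * a[k] for k in range(3))
            sb = sum(n[k] * b[k] for k in range(3))
            if sa * sb >= 0:
                continue  # both endpoints on the same side: no crossing
            t = sa / (sa - sb)
            q = tuple(a[k] + t * (b[k] - a[k]) for k in range(3))
            if in_view(q, tan_h, tan_v):
                return True
    return False
```

Running the cheap vertex test before the intersection test mirrors the role of the status check table: most triangles are classified without any intersection computation.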

Estimating position and pose from the constraint equation system
Each visible plane restricts the mobile robot's position to a line on the ground, with this constraint line being parallel to the intersection of the constraint plane and the ground. The constraint plane and the ground line are depicted in Fig 6. As illustrated in Fig 6, the constraint plane intersects the ground, and the constraint line, parallel to the intersection line, is rendered in red. The plane equation, $\pi^R$, represented in the robot coordinate system is given in Eq (15), where $\pi^R_1$, $\pi^R_2$, $\pi^R_3$, $\pi^R_4$ represent the coefficients of the plane $\pi^R$. The plane equation, $\pi^G$, represented in the ground coordinate system is given in Eq (16), where $\pi^G_1$, $\pi^G_2$, $\pi^G_3$, $\pi^G_4$ represent the coefficients of the plane $\pi^G$.
The distance between the mobile robot's center and the plane is the same in both coordinate systems. In the ground coordinate system, where the robot's center is located at $(x, y, 0)$, this distance is expressed in Eq (17).

$$d^G = \frac{\left|\pi^G_1 x + \pi^G_2 y + \pi^G_4\right|}{\sqrt{(\pi^G_1)^2 + (\pi^G_2)^2 + (\pi^G_3)^2}} \tag{17}$$

where $d^G$ denotes the distance between the target plane and the mobile robot's center. In the robot coordinate system, where the robot's center is the origin, this distance is given in Eq (18).

$$d^R = \frac{\left|\pi^R_4\right|}{\sqrt{(\pi^R_1)^2 + (\pi^R_2)^2 + (\pi^R_3)^2}} \tag{18}$$

where $d^R$ denotes the same distance measured in the robot coordinate system. Given the distance invariance $d^G = d^R$ of Eq (19), equating Eqs (17) and (18) gives Eq (20).

$$\frac{\left|\pi^G_1 x + \pi^G_2 y + \pi^G_4\right|}{\sqrt{(\pi^G_1)^2 + (\pi^G_2)^2 + (\pi^G_3)^2}} = \frac{\left|\pi^R_4\right|}{\sqrt{(\pi^R_1)^2 + (\pi^R_2)^2 + (\pi^R_3)^2}} \tag{20}$$

Eq (20) represents the position constraint for the mobile robot on the ground. For an ideal mathematical plane, the absolute values in Eq (20) admit both a positive and a negative case. In other words, the mobile robot could be situated on either side of the plane. In reality, only one side of any given plane surface can be visible, meaning that only one case needs to be considered for a plane actually observed by stereo vision, as displayed in Fig 7. Consequently, the solution of the absolute value Eq (20) is unique.
The resulting constraint is written with coefficients $a$ and $c$, each of which contains the normalizing square-root terms of the plane coefficients. The parameter $\delta^{+}$ is the sign that indicates the plane's front direction, as discussed in Fig 7. The definition of the direction parameter $\delta^{+}$ is tied to the plane's equation. When $\pi^G_4 > 0$ and $\pi^R_4 > 0$, and for the plane represented in the ground coordinate system, if the origin is situated on the observable side, then $\delta^{+} = 1$. Otherwise, if the origin is positioned on the unobservable side, then $\delta^{+} = -1$.
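The distance invariance behind the position constraint can be checked numerically. The sketch assumes the planar form of $T^G_R$ (rotation by $\varphi$ about z plus translation $(x, y, 0)$) and the transpose rule for mapping plane coefficients between frames. The wall and pose values are hypothetical.

```python
import numpy as np

def point_plane_distance(plane, point):
    """Distance from a 3-D point to the plane (pi_1, pi_2, pi_3, pi_4)."""
    normal, offset = plane[:3], plane[3]
    return abs(normal @ point + offset) / np.linalg.norm(normal)

# Hypothetical robot pose in the ground frame.
x, y, phi = 500.0, 500.0, np.pi / 6
c, s = np.cos(phi), np.sin(phi)
T_G_R = np.array([
    [c,  -s,  0.0, x],
    [s,   c,  0.0, y],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Hypothetical wall x = 1500 mm, expressed in the ground frame.
pi_G = np.array([1.0, 0.0, 0.0, -1500.0])

# The same plane expressed in the robot frame.
pi_R = T_G_R.T @ pi_G

# Ground-frame distance from the robot center (x, y, 0) to the plane.
d_G = point_plane_distance(pi_G, np.array([x, y, 0.0]))

# Robot-frame distance from the origin to the plane.
d_R = abs(pi_R[3]) / np.linalg.norm(pi_R[:3])
```

Both distances agree regardless of $\varphi$, which is why a single plane constrains the position to a line but leaves the position along that line free.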
The multiple observed planes captured by the camera system comprise the constraint set, as demonstrated in Eq (25).
If there is only one equation, the mobile robot's position is constrained to a line. If there are two equations, the robot's position can be calculated by solving the system of two equations. When there are more than two equations, the mobile robot's position can be determined through regression. Assuming that the x-coordinate and y-coordinate of the mobile robot are independent and that the estimates follow a normal distribution, the least squares method can be employed to obtain an unbiased set of estimates, $(\hat{x}, \hat{y})$. The regression model for position estimation is presented in Eq (26).
In Eq (26), $\varepsilon_i(x, y)$ represents the sample error, given by the distance from the estimated coordinates to the constraint line. This equation is valid because accurate observations should result in all constraint lines intersecting at a single point $(\hat{x}, \hat{y})$, rendering the distance, $d_i(x, y)$, defined in Eq (26) zero. The estimation distance error $\varepsilon_i(x, y)$ is assumed to follow a normal distribution, $\varepsilon_i \sim N(0, \sigma_i^2)$. The variance of the distribution depends on the number of points observed on the plane.
According to the least squares principle, the evaluation function can be expressed as Eq (27).
The partial derivatives of Eq (27) should equal zero, as shown in Eq (28).
The solution of Eq (29) is shown in Eq (30), where $\sigma^2_{d,i}$ represents the distance observation variance for plane $i$, which is influenced by various factors. In this case, the variance is taken to be inversely proportional to the point cloud size, as given in Eq (31), where $N$ represents the size of the point cloud characterizing the observed plane and $\sigma^2_{P,d}$ represents the variance of a single point distance measurement. A larger point cloud size results in a smaller variance.
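The weighted position estimate can be sketched as a small weighted least squares problem. The sketch assumes that each constraint is a ground line $a_i x + b_i y + c_i = 0$ and that the weight of each line is its point count $N_i$, consistent with a variance that is inversely proportional to $N_i$. It does not reproduce the closed form of Eq (30). The function name and example values are hypothetical.

```python
import numpy as np

def estimate_position(lines, point_counts):
    """Weighted least squares intersection of ground constraint lines.

    lines: rows (a, b, c) of a*x + b*y + c = 0.
    point_counts: number of plane points behind each line, used as weights.
    """
    L = np.asarray(lines, dtype=float)
    # Normalize so that each residual is a point-to-line distance.
    L = L / np.linalg.norm(L[:, :2], axis=1, keepdims=True)
    # Scaling rows by sqrt(N_i) weights the squared residuals by N_i.
    w = np.sqrt(np.asarray(point_counts, dtype=float))
    A = L[:, :2] * w[:, None]
    b = -L[:, 2] * w
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution

# Three hypothetical, mutually consistent constraint lines through (500, 500).
lines = [(1.0, 0.0, -500.0), (0.0, 1.0, -500.0), (1.0, 1.0, -1000.0)]
xy_hat = estimate_position(lines, point_counts=[1000, 2000, 500])
```

With noisy lines, the estimate is pulled toward the constraints backed by larger point clouds, which is the intended effect of the weighting.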
The pose angle estimation process resembles the mobile robot position estimation. The pose angle is defined as the included angle, $\varphi$, between the normal vector of plane $\pi^G$ projected onto the xOy plane and the normal vector of plane $\pi^R$ projected onto the xOy plane. The included angle satisfies Eq (32), and it can be expressed as Eq (33).
According to the observation Eq (33), a single plane constraint suffices for pose angle estimation, whereas position estimation necessitates two plane constraints.
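A single-plane angle estimate can be sketched with signed angles. Eqs (32) and (33) are not reproduced in this text, so the sketch uses `atan2` on the projected normals, which is one common way to obtain the signed angle between two planar vectors and may differ in form from the paper's expression. It assumes both normals point to the same side of the plane. The function name is hypothetical.

```python
import math

def pose_angle_from_plane(normal_g, normal_r):
    """Signed angle that rotates the robot-frame normal onto the ground-frame normal.

    Both normals are projected onto the xOy plane by dropping z.
    """
    angle_g = math.atan2(normal_g[1], normal_g[0])
    angle_r = math.atan2(normal_r[1], normal_r[0])
    phi = angle_g - angle_r
    # Wrap the result to (-pi, pi].
    return math.atan2(math.sin(phi), math.cos(phi))

# Hypothetical wall with normal along x_G, seen by a robot rotated by pi/6.
true_phi = math.pi / 6
normal_g = (1.0, 0.0, 0.0)
normal_r = (math.cos(true_phi), -math.sin(true_phi), 0.0)
phi_hat = pose_angle_from_plane(normal_g, normal_r)
```

The projection is undefined for a ground-parallel plane, whose normal has no xOy component.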
The precondition of Eq (33) is that the plane $\pi^G$ is not parallel to the ground, which corresponds to the xOy plane of the ground coordinate system. This condition is relatively simple to satisfy in practical mobile robot localization scenarios. Firstly, plane observation errors ensure that the statistical probability of measuring an exactly ground-parallel plane is zero. Secondly, mobile robot cameras are typically mounted on the robot's side, making it nearly impossible to observe a ground-parallel plane unless the line of sight is directed upwards. Lastly, prior information can be used to eliminate ground-parallel planes.
The weighted error sum of the square equation is presented in Eq (34).
The least squares estimate, the weighting coefficient for the pose angle estimation, and the normal equation of the least squares estimation follow in turn. Rearranging the normal equation, Eq (37), yields the weighted least squares estimate of the pose angle, $\hat{\varphi}$, given in Eq (38).
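Combining several per-plane angle estimates can be sketched as a weighted average. Eq (38) is not reproduced in this text. The sketch below substitutes a weighted circular mean, which behaves like an ordinary weighted mean for tightly clustered angles but avoids wrap-around problems near $\pm\pi$. Using point counts as weights is an assumption carried over from the position estimation, and the function name is hypothetical.

```python
import math

def weighted_pose_angle(angles, weights):
    """Weighted circular mean of per-plane pose angle estimates (radians)."""
    s = sum(w * math.sin(a) for a, w in zip(angles, weights))
    c = sum(w * math.cos(a) for a, w in zip(angles, weights))
    return math.atan2(s, c)

# Two hypothetical single-plane estimates with equal point counts.
phi_hat = weighted_pose_angle([0.50, 0.54], [1000, 1000])
```

A plain arithmetic mean of angles just below $\pi$ and just above $-\pi$ would give a value near zero, which is the failure case the circular form avoids.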

Prototype development and workspace configuration
The prototype's appearance is displayed in Fig 8(A). The mobile robot consists of vision, motion, and control systems. The vision system includes two Intel RealSense D435i RGB-D cameras, while the motion system is an omnidirectional mobile chassis primarily composed of four Mecanum wheels, as shown in Fig 8(B). The central control system mainly consists of NVIDIA Jetson Nano modules. The structure of the mobile robot system is illustrated in Fig 8(C), where the blue blocks represent hardware components, the red blocks represent software components, and the green blocks represent data flow components. MATLAB is used to analyze the raw experimental data and to draw the figures.
The symmetrically arranged Mecanum wheels enable multidirectional movement of the chassis, with encoder feedback enhancing movement precision. The motion system communicates with the control system via a standard serial port communication protocol.
The control system receives external movement requirements specifying the macroscopic motion of the mobile robot. The motion target management module processes these requirements into motion paths and forwards them to the motion trajectory control module. This module calculates specific linear and rotational motion parameters for the current motion stage and transmits them to the serial communication module in a formatted structure. The serial communication module sends motion commands to the motion system in accordance with the serial communication protocol and monitors the chassis's movement.
The control system's pivotal module is the ROI generation server. It iterates through all cameras and structured planes based on initial estimated coordinates, calculating filter ROIs from the theoretically visible plane set. The vision system filters point clouds for each plane, constraining the mobile robot's position and pose. Each filtered ROI is sent to the plane recognition server, which establishes the constraint equation system.
The vision system identifies parameters of each constraint plane within its field of view and transmits the constraint equation system to the multi-plane positioning module. In conjunction with the known structured plane set, position and pose constraints for the mobile robot are obtained.
Eqs (39) and (40) give the homogeneous transformation matrices from the two camera coordinate systems to the robot coordinate system. Eq (39) gives the homogeneous transformation matrix for the first camera. The rotation matrix in its top-left block is not the identity because the z-axis of the camera coordinate system points directly forward, and the translation includes a 17.5 mm offset in the y-direction because the origin of the camera coordinate system is located on the camera's left infrared sensing element.
Eq (40) presents the homogeneous transformation for the second camera.
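As an illustration of how such a camera-to-robot transformation is applied, the following sketch builds a 4×4 homogeneous matrix and maps camera-frame points into the robot frame. The axis conventions (robot x forward, y left, z up; camera z forward, x right, y down), the sign of the 17.5 mm offset, and the zero x and z offsets are assumptions chosen for the example; they are not the values of Eqs (39) and (40).

```python
import numpy as np

def make_transform(rotation, translation):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

# Assumed axis mapping: camera z -> robot x, camera x -> robot -y, camera y -> robot -z.
R_CAM_TO_ROBOT = np.array([[0.0, 0.0, 1.0],
                           [-1.0, 0.0, 0.0],
                           [0.0, -1.0, 0.0]])

# Hypothetical first-camera transform: only a 17.5 mm offset along robot y.
T_CAM1 = make_transform(R_CAM_TO_ROBOT, [0.0, 17.5, 0.0])

def transform_points(T, points):
    """Map an (N, 3) array of points through the homogeneous transform T."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.column_stack([pts, np.ones(len(pts))])
    return (homogeneous @ T.T)[:, :3]
```

Under these assumptions, a point 1000 mm straight ahead of the camera maps to 1000 mm ahead of the robot origin, shifted by 17.5 mm along the robot y-axis.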
The mobile robot test site is enclosed by 300 mm high walls, forming a 1500 mm by 1500 mm area, as shown in Fig 9. The experimental workspace simulates a building's walls, which the mobile robot uses for indoor localization. Table 2 contains the structural parameters of each plane within the ground coordinate system. The ground coordinate system's xOy plane coincides with the surface on which the mobile robot operates. Similarly, the origin of the robot coordinate system is situated on the same plane where the mobile robot moves.

Robustness testing of the proposed method with various initial positions and poses
A difference between the mobile robot's estimated initial coordinates and its actual coordinates may cause some points in the filtered point cloud not to represent the relevant plane. Nevertheless, the RANSAC method exhibits high tolerance for outliers in the point cloud, so plane parameter identification is affected only minimally even when the filter ROI contains numerous outlier points. This experiment is designed to verify this robustness.
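The outlier tolerance of RANSAC plane fitting can be illustrated with a minimal NumPy sketch. It is a generic textbook RANSAC, not the implementation used in the prototype; the function name, the 5 mm inlier threshold, and the iteration count are assumptions chosen for the example.

```python
import numpy as np

def ransac_plane(points, threshold=5.0, iterations=200, seed=0):
    """Fit a plane n . p + d = 0 to an (N, 3) point cloud with basic RANSAC.

    Returns ((normal, d), inlier_mask). The threshold is in the same unit as
    the points (millimetres are assumed here).
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best_inliers = np.zeros(len(pts), dtype=bool)
    best_model = None
    for _ in range(iterations):
        # Three random points define a candidate plane.
        sample = pts[rng.choice(len(pts), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # degenerate (nearly collinear) sample
        normal = normal / norm
        d = -normal @ sample[0]
        # Points within the distance threshold count as inliers.
        inliers = np.abs(pts @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
            best_model = (normal, d)
    return best_model, best_inliers
```

Because the winning candidate is the one supported by the most points, points inside the ROI that belong to other surfaces lower the inlier ratio but do not shift the fitted plane, provided the target plane still has the largest support.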
The mobile robot's center is positioned at coordinates (500, 500), with its direction vector represented in the ground coordinate system as (-1, 0) and its pose angle as π. The pose angle is defined as the angle between the mobile robot's direction vector and the ground coordinate system's x-axis.
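The relation between the pose angle and the direction vector follows directly from the definition above. The sketch below assumes the angle is measured counterclockwise from the ground x-axis; the helper names are illustrative.

```python
import math

def direction_vector(pose_angle):
    """Unit direction vector in the ground frame for a given pose angle (radians)."""
    return (math.cos(pose_angle), math.sin(pose_angle))

def pose_angle(direction):
    """Pose angle in [0, 2*pi) for a direction vector in the ground frame."""
    return math.atan2(direction[1], direction[0]) % (2 * math.pi)
```

For a pose angle of π, this gives the direction vector (-1, 0) up to floating-point rounding, matching the configuration described above.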
Initially, the mobile robot is placed facing the target plane, with the target plane's normal vector parallel to the camera's z-axis. The x-coordinate, y-coordinate, and pose angle values are adjusted around the actual value.
Subsequently, an initial pose angle error of π/6 is manually added to the actual pose angle. The actual pose angle of the mobile robot is 5π/6, and the assumed position and pose variations are consistent with the first set of experiments. While keeping the robot stationary, the initial estimated position and pose are adjusted. Two primary indicators are evaluated to assess the impact on plane identification: the size of the point cloud filtered by the calculated ROI and the successful identification of the corresponding plane within the point cloud. To ensure point cloud size accuracy, no down-sampling is performed during pre-processing.
The assumed initial x-coordinate of the mobile robot ranges from 200 mm to 1200 mm. Fig 10(A) illustrates the variation of the filtered ROI within the extended image coordinate system, while Fig 10(B) displays the curve of the number of fitted plane points in the point cloud obtained by the filter ROI as the initial x-coordinate varies.
As shown in Fig 10(A), the black rectangle represents the boundary of the view field region (VFR), while the red rectangles indicate the filtered ROIs of the corresponding planes. Larger line widths signify greater assumed initial distances between the camera and the target plane.
The green vertical line in Fig 10(B) corresponds to the actual x-coordinate of the robot. As the assumed initial distance between the mobile robot and the plane increases, the filter ROI area and the number of plane fitting points obtained after point cloud screening decrease. However, the identification remains successful even as the assumed distance between the camera and the target plane increases from 300 mm to 1500 mm. A larger filtered ROI does not always correspond to a larger plane point cloud, because the ROI may already contain all the points representing the plane within the field of view.
Fig 11(A) demonstrates the geometric changes in the filtered ROI as the assumed initial y-coordinate varies within the extended image coordinate system, and Fig 11(B) presents the curve of fitted plane points as the initial y-coordinate increases from 200 mm to 1200 mm.
As shown in Fig 11(A), varying the assumed initial y-coordinate influences the horizontal location of the rectangular ROI. When the assumed initial y-coordinate is equal to the actual y-coordinate, the number of fitted plane points reaches its maximum. As the assumed initial y-coordinate increases, the actual number of identified plane points first increases and then decreases.
Fig 12(A) reveals the geometric changes in the filtered ROI as the assumed initial pose varies within the extended image coordinate system, and Fig 12(B) shows the curve of the number of fitted plane points as the initial pose angle increases from 2π/3 to 4π/3. The assumed initial coordinates of the mobile robot are (500, 500). The plane fitting results in Fig 12(B) indicate that the number of fitted plane points initially increases and then decreases as the pose angle increases. When the camera faces directly toward the target plane, the point cloud reaches its maximum size.
When the camera's z-axis is parallel to the plane's normal vector, the initial position and pose accuracy requirements are not stringent. As seen in Figs 10-12, even when the assumed initial position and pose undergo significant changes, the target planes can be correctly identified.
In the following experiments, the actual pose angle of the mobile robot is changed so that the camera's forward direction is not parallel to the normal vector of the plane. The center of the mobile robot is placed at coordinates (500, 500), and its direction vector in the ground coordinate system is (-√3/2, 1/2). The actual pose angle of the mobile robot is approximately 5π/6. The supposed initial pose angle is still set to π.
Fig 13(A) demonstrates the geometric changes in the filtered ROI as the supposed initial x-coordinate varies within the extended image coordinate system. Fig 13(B) presents the curve of fitted plane points as the initial x-coordinate increases from 200 mm to 1200 mm. The number of fitted plane points decreases as the assumed initial x-coordinate increases, but in all experiments, the target planes are correctly identified.
Fig 14(A) displays the geometric changes in the filtered ROI as the assumed initial y-coordinate varies within the extended image coordinate system, and Fig 14(B) shows the curve of fitted plane points as the initial y-coordinate increases from 200 mm to 1200 mm.
The red segments in Fig 14(B) represent incorrect identification results for the target plane. As the supposed initial y-coordinate increases, the filter ROI rectangle moves left in the extended image coordinate system. However, the plane on the left of the actual view field is not the target plane.
Fig 15(A) illustrates the geometric changes in the filtered ROI as the assumed initial pose angle varies within the extended image coordinate system, while Fig 15(B) shows the curve of the number of fitted plane points as the supposed initial pose angle increases from 2π/3 to 4π/3. The wrong plane, represented by red lines, is fitted because it occupies a larger area in the camera's view field. However, as long as the error in the initial pose angle is not too large, the proposed method can identify the correct plane.
These experiments validate the proposed method based on RANSAC plane fitting and prior plane information. Filtering the point cloud of the localization plane by the ROI tolerates a certain degree of initial position and pose error. As long as the estimated initial position and pose of the mobile robot do not differ too much from the actual values, the correct plane can be identified from the point cloud filtered by the ROI.

Localization validation experiment
An experiment is conducted to ascertain the efficacy of the localization method using scene prior information for visual path tracking in a mobile robot navigation application. The experiment uses the previously introduced prototype and testing environment, where target points are randomly arranged throughout the workspace. The mobile robot starts at the coordinate (0, 1000) and sequentially passes through every target point before ultimately arriving at the coordinate (1000, 0). The sequence of target points is illustrated in Fig 16.
Throughout the path execution, the mobile robot employs the aforementioned localization method, which integrates the a-priori information of the environment. At each moment, the robot uses its theoretical pose and position coordinates as initial inputs to calculate the ROI parameters. The obtained ROI parameters are then used to filter the plane point clouds within the field of view. The robot subsequently matches the filtered point clouds with the corresponding planes in the a-priori map, which in turn allows the mobile robot to be localized.
Fig 17 exhibits the scene point clouds captured by the forward-facing camera during the path tracking process. The figure also illustrates the plane point clouds obtained through segmentation and plane fitting, which are used for localization. As seen in Fig 17, there are instances where the forward-facing camera can observe only one plane. However, with the help of the side-mounted cameras, the system can consistently observe two or more planes during path tracking.
Comparing the localization errors before and after implementing the visual tracking navigation method with scene prior information shows that using the a-priori plane information for localization significantly reduces the motion error. This improvement suggests that the proposed method can effectively perform localization tasks in featureless or repetitive environments, leading to better overall performance.
Specific experimental results, obtained using the point cloud segmentation method based on a-priori planes, are illustrated in Fig 18. For the comparison experiment, the path presented in Fig 16 is executed with and without visual feedback, and each experiment is repeated 15 times. The average coordinates at each path position are calculated and compared to the ideal coordinates to determine the offset distance between the actual and ideal positions. The actual position of the mobile robot is captured using a motion capture system with a precision better than 1 mm, which can be considered the ground truth relative to the vision system. Since the first point is manually positioned, the error for this point is insignificant, and the comparison starts from the second point. The numerical data are available in S1 Table. Fig 18 presents the localization errors at various points along the path for both cases, with and without visual feedback. The motion error accumulates when visual feedback is not utilized and is reduced when visual feedback is employed. The comparative analysis of the experimental results is shown in Table 3.
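The offset computation described above (average the measured coordinates over the repeated runs, then take the distance to the ideal coordinates) can be sketched as follows. The function name and array layout are assumptions; the measured values themselves are in S1 Table and are not reproduced here.

```python
import numpy as np

def mean_offsets(runs, ideal):
    """Offset distance between averaged measured positions and ideal positions.

    runs:  array of shape (n_runs, n_points, 2), measured (x, y) per run
    ideal: array of shape (n_points, 2), ideal (x, y) of each target point
    Returns an array of shape (n_points,) with one offset distance per point.
    """
    runs = np.asarray(runs, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    mean_xy = runs.mean(axis=0)  # average over the repeated runs
    return np.linalg.norm(mean_xy - ideal, axis=1)
```

With 15 repetitions and the motion capture measurements as input, `runs` would have shape (15, n_points, 2); calling the function once for the runs with visual feedback and once for those without gives the two error curves to compare.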

Discussion
A-priori planes may be acquired, for instance, from a building's blueprint in BIM or from a pre-existing map in SLAM. These planes are represented by their vertices. As length or height increases, some vertices and edges of the a-priori plane may vanish from the current perspective, rendering only a partial area visible. Nevertheless, vertices beyond the field of view can be arranged in the suggested extended image coordinate system. Similar to several other approaches, the localization procedure necessitates an initial pose as a starting parameter for the algorithm [36,40,41]. In contrast to the plane extraction technique in [27,28], the method proposed herein capitalizes on known map information to extract plane point clouds. The method introduced in [30] requires that planes be parallel or perpendicular to one another, a strict condition that restricts its applicability to localization. Even within numerous well-designed structures, corridors are not always aligned horizontally and vertically. The method proposed in this article provides an analytical solution for localization with planes of any pose. Moreover, the number of planes is not limited. The pose and position estimations are based on weighted least squares. The more points an observed plane contains, the greater its weight.
Provided that planes are correctly matched, localization precision in the proposed method depends only on the accuracy of the camera or other visual system. Plane matching correctness relies on map complexity and historical motion patterns, not camera accuracy, which is a notable advantage over Schaub et al. [36]. In instances where two planes exhibiting similar angles and distances occupy the field of view, identification errors might arise. However, real-world maps typically feature walls with perpendicular orientations and significant differences, making matching failures improbable. The a-priori information form employed in this method is succinct, necessitating only vertices to represent planes. This representation conserves storage and computational resources.
The proposed approach emphasizes real-time data processing for localization and improves pose estimation by leveraging historical data. Its application is closely tied to indoor environments, since the flat planes used for localization typically occur indoors. The method requires an initial position and orientation, and the mobile robot's motion error must not be excessive. Should the motion error between two localizations be too great, the deviation of the initial pose supplied to the subsequent observation will be substantial, resulting in imprecise ROI calculation. Consequently, point cloud segmentation of the corresponding plane will be inaccurate, leading to localization errors. Finally, the tilt angles between localization planes within the scene should differ significantly to avert matching errors.
Our proposed method leverages prior information to segment partial point clouds from the field of view, which provides a significant advantage when dealing with some distorted point clouds in view. By focusing on the segmented relevant point clouds, the method remains unaffected by the distortion and maintains its localization effectiveness. This approach is particularly suitable for environments with complex or irregular geometries, where distortion in point cloud data is more likely to occur.
With the features discussed above, the proposed method achieves robot localization in weak-feature and repetitive environments by capitalizing on prior plane information. This approach is well-suited for scenarios in which mobile robots adhere to predetermined paths during movement. If integrated with other localization methods, there is potential for realizing even more sophisticated and intelligent mobile robot localization.

Conclusion
This article introduces a mobile robot localization technique with a-priori planes. Initially, given the finite nature of planes in reality, they are represented by the vertices of planar closed shapes. All vertices in space are projected onto the proposed extended image coordinate system to ascertain whether the plane theoretically exists within the camera's field of view. The theoretical visible vertices compose the filter ROI, which serves as a preliminary selection of points representing the target plane. Subsequently, the RANSAC method is employed to identify plane parameters. The filter ROI calculation is also contingent upon an initial known position and pose, which, however, need not be numerically precise.
The identified planes are first represented in the camera coordinate system. A constraint equation system is established by integrating the plane equation in the camera coordinate system with the prior known plane in the ground coordinate system. This constraint equation system delineates the position and pose constraints for the mobile robot. When the constraint equation count exceeds two, weighted least squares estimation is applied to determine the optimal position and pose.
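A simplified planar version of such a weighted estimate can be sketched as follows. It is an illustrative formulation, not the constraint equations derived in this article: walls are assumed vertical, each a-priori wall is written as n · p = d in the ground frame and each observed wall as m · q = e in the robot frame, the observed normals are assumed to be oriented consistently with the a-priori normals, and each plane is weighted by its number of fitted points. With p = R(θ) q + t, matching the two descriptions gives R(θ) m = n and n · t = d - e, which the code solves in two steps. At least two non-parallel walls are needed for a unique position.

```python
import numpy as np

def estimate_pose(ground_planes, observed_planes, point_counts):
    """Weighted planar pose estimate (x, y, theta) from matched wall planes.

    ground_planes:   list of (n, d), unit 2D normal and offset, ground frame
    observed_planes: list of (m, e), unit 2D normal and offset, robot frame
    point_counts:    number of fitted points per plane, used as weights
    """
    n = np.array([g[0] for g in ground_planes], dtype=float)
    d = np.array([g[1] for g in ground_planes], dtype=float)
    m = np.array([o[0] for o in observed_planes], dtype=float)
    e = np.array([o[1] for o in observed_planes], dtype=float)
    w = np.asarray(point_counts, dtype=float)
    w = w / w.sum()

    # Heading: weighted circular mean of the angle that rotates each m onto n.
    delta = np.arctan2(n[:, 1], n[:, 0]) - np.arctan2(m[:, 1], m[:, 0])
    theta = np.arctan2(np.sum(w * np.sin(delta)), np.sum(w * np.cos(delta)))

    # Position: weighted least squares on the linear constraints n_i . t = d_i - e_i.
    sqrt_w = np.sqrt(w)
    t, *_ = np.linalg.lstsq(n * sqrt_w[:, None], (d - e) * sqrt_w, rcond=None)
    return t[0], t[1], theta
```

Scaling each row by the square root of its weight turns the weighted problem into an ordinary least-squares problem, so planes supported by more points pull the solution more strongly.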
Lastly, the tolerance to error in the initial known position and pose is assessed using the mobile robot prototype. Experimental results indicate that a certain degree of initial position and pose error does not impact plane identification outcomes. The proposed localization method is further validated in the path tracking experiment. A high-precision motion capture system is employed to evaluate the localization results of the mobile robot's vision system.
Looking ahead, the integration of multi-source visual perception approaches will be further explored to achieve effective mobile robot localization in more complex scenarios.